---
title: Machine Learning Analytics with Zeppelin
---

[Apache Zeppelin](http://zeppelin-project.org/) is an interactive computational
environment built on Apache Spark, similar to the IPython Notebook. With [Apache
PredictionIO (incubating)](http://predictionio.incubator.apache.org) and [Spark
SQL](https://spark.apache.org/sql/), you can easily analyze your collected
events while developing or tuning your engine.

## Prerequisites

The following instructions assume that you have the command `sbt` accessible in
your shell's search path. Alternatively, you can use the `sbt` command that
comes with Apache PredictionIO (incubating) at `$PIO_HOME/sbt/sbt`.

<%= partial 'shared/datacollection/parquet' %>

## Building Zeppelin for Apache Spark 1.2+

Start by cloning Zeppelin.

```
$ git clone https://github.com/apache/incubator-zeppelin.git
```

Build Zeppelin with the Hadoop 2.4 and Spark 1.2 profiles.

```
$ cd incubator-zeppelin
$ mvn clean package -Pspark-1.2 -Dhadoop.version=2.4.0 -Phadoop-2.4 -DskipTests
```

Now you should have working Zeppelin binaries.

## Preparing Zeppelin

First, start Zeppelin.

```
$ bin/zeppelin-daemon.sh start
```

By default, Zeppelin is accessible in a web browser at
http://localhost:8080. Create a new notebook and put the following in the first
cell. It loads the Parquet files at `/tmp/movies` and registers them as a
temporary table named `events`.

```scala
sqlc.parquetFile("/tmp/movies").registerTempTable("events")
```

![Preparing Zeppelin](/images/datacollection/zeppelin-01.png)
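
Before writing queries, it can help to confirm that the Parquet files loaded as expected. The following is a minimal sketch for an extra notebook cell. It assumes that `sqlc` is the SQL context Zeppelin provides (as in the cell above) and that `/tmp/movies` is the path you exported your events to. The variable name `exported` is only illustrative.

```scala
// Assumption: `sqlc` is the SQLContext supplied by Zeppelin, as used in the first cell.
// Assumption: /tmp/movies is the directory holding the exported Parquet files.
val exported = sqlc.parquetFile("/tmp/movies")

// Print the column names and types that Spark SQL inferred from the Parquet files.
exported.printSchema()

// Count the exported events; this forces Spark to actually read the data.
println(exported.count())
```

If the schema lists the fields used in the queries below (such as `entityType`, `event`, and `properties`), the table is ready to be queried.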

## Performing Analysis with Zeppelin

If all of the steps above ran successfully, you now have a ready-to-use
analytics environment. Let's try a few examples to check that everything works.

Put the following query in the second cell and run it.

```
%sql
SELECT entityType, event, targetEntityType, COUNT(*) AS c FROM events
GROUP BY entityType, event, targetEntityType
```

![Summary of Events](/images/datacollection/zeppelin-02.png)
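
If you would rather work with the result programmatically than view it as a table, the same summary can be run from a Scala cell. This is a sketch only. It assumes the `events` temporary table has already been registered and that `sqlc` is the SQL context Zeppelin provides. The name `summary` is illustrative.

```scala
// Assumption: the `events` temporary table was registered in an earlier cell.
// Assumption: `sqlc` is the SQLContext supplied by Zeppelin.
val summary = sqlc.sql(
  """SELECT entityType, event, targetEntityType, COUNT(*) AS c
    |FROM events
    |GROUP BY entityType, event, targetEntityType""".stripMargin)

// Bring the (small) aggregated result to the driver and print one row per line.
summary.collect().foreach(println)
```

Collecting is reasonable here because the grouped result has only a handful of rows. Avoid calling `collect()` on the raw `events` table if it is large.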

We can also easily plot a pie chart. Run the following query, then switch the
result view to the pie chart.

```
%sql
SELECT event, COUNT(*) AS c FROM events GROUP BY event
```

![Summary of Events in Pie Chart](/images/datacollection/zeppelin-03.png)

We can also see a breakdown of rating values.

```
%sql
SELECT properties.rating AS r, COUNT(*) AS c FROM events
WHERE properties.rating IS NOT NULL GROUP BY properties.rating ORDER BY r
```

![Breakdown of Rating Values](/images/datacollection/zeppelin-04.png)

Happy analyzing!
